Systems and methods for modifying labeled content

ABSTRACT

Systems and methods are disclosed for modifying labeled target content for a capture device. A computer-implemented method may use a computer system that includes non-transient electronic storage, a graphical user interface, and one or more physical computer processors. The computer-implemented method may include: obtaining labeled target content, the labeled target content including one or more facial features that have been labeled; modifying the labeled target content to match dynamically captured content from a first capture device to generate modified target content; and storing the modified target content. The dynamically captured content may include the one or more facial features.

TECHNICAL FIELD

The present disclosure relates generally to modifying labeled content.

BRIEF SUMMARY OF THE DISCLOSURE

Embodiments of the disclosure are directed to systems and methods for modifying labeled content.

In accordance with the technology described herein, a method may be implemented in a computer system that may include non-transient electronic storage, one or more physical computer processors, and a graphical user interface. The computer-implemented method may include one or more operations. One operation may include obtaining, from the non-transient electronic storage, labeled target content that may include one or more facial features that have been labeled. One operation may include modifying, with one or more physical computer processors, the labeled target content to match dynamically captured content from a first capture device to generate modified target content. The dynamically captured content may include the one or more facial features. One operation may include storing, in the non-transient electronic storage, the modified target content.

In embodiments, the one or more facial features may include one or more mouth features.

In embodiments, modifying the labeled target content may include cropping, with the one or more physical computer processors, the labeled target content such that the one or more mouth features are in cropped target content. Modifying the labeled target content may include warping, with the one or more physical computer processors, the cropped target content to match the dynamically captured content corresponding to the one or more mouth features.

In embodiments, the one or more facial features may also include one or more eye features.

In embodiments, modifying the labeled target content may include rotating, with the one or more physical computer processors, the labeled target content based on an angle of the dynamically captured content to generate rotated target content. Modifying the labeled target content may include cropping, with the one or more physical computer processors, the rotated target content such that the one or more eye features are in cropped target content. Modifying the labeled target content may include warping, with the one or more physical computer processors, the cropped target content to match the dynamically captured content corresponding to the one or more eye features.

In embodiments, the computer-implemented method may further include generating, with the one or more physical computer processors, the one or more facial features in the dynamically captured content using the modified target content.

In embodiments, generating the one or more facial features may include estimating, with the one or more physical computer processors, one or more bounding boxes around parts of the face in the dynamically captured content using one or more facial parameters in the dynamically captured content to generate the one or more facial features from the dynamically captured content. The one or more facial parameters may include one or more of a color, a curve, and a reflected light intensity. Generating the one or more facial features may include generating, with the one or more physical computer processors, the one or more facial features in the dynamically captured content using the one or more bounding boxes. Generating the one or more facial features may include identifying, with the one or more physical computer processors, a closed eye when a given image in converted captured content is within a first threshold range. The converted captured content may be derived from the dynamically captured content. Generating the one or more facial features may include generating, with the one or more physical computer processors, a position of a pupil when a portion of the given image in the converted captured content is within a second threshold range.

In embodiments, the computer-implemented method may further include dynamically generating, with the one or more physical computer processors, a representation of a face using visual effects to depict the one or more facial features, and displaying, via the graphical user interface, the representation.

In embodiments, the first capture device may be different from a second capture device used to capture the labeled target content such that the dynamically captured content is distorted differently than the labeled target content.

In embodiments, modifying the labeled target content may include converting, with the one or more physical computer processors, the labeled target content into converted content, wherein the converted content uses a different color format than the labeled target content.

In embodiments, the capture device may include a head mounted display. The head mounted display may include a red-green-blue camera capturing one or more mouth features. The head mounted display may include an infrared camera capturing one or more eye features. The head mounted display may include an infrared illuminator illuminating the one or more eye features.

In accordance with additional aspects of the present disclosure, a method for modifying labeled target content for a first capture device may be implemented in a computer system that may include non-transient electronic storage, one or more physical computer processors, and a graphical user interface. The computer-implemented method may include one or more operations. One operation may include obtaining, from the non-transient electronic storage, the modified target content. The modified target content may match a distortion of the first capture device. One operation may include generating, with the one or more physical computer processors, the one or more facial features in the dynamically captured content using the modified target content. One operation may include dynamically generating, with the one or more physical computer processors, a representation of a face using visual effects to depict one or more of the one or more facial features and the changes to the one or more facial features. One operation may include displaying, via the graphical user interface, the representation.

In embodiments, modifying the labeled target content may include cropping, with the one or more physical computer processors, the labeled target content such that the one or more mouth features are in cropped target content. Modifying the labeled target content may include warping, with the one or more physical computer processors, the cropped target content to match the dynamically captured content corresponding to the one or more mouth features.

In embodiments, modifying the labeled target content may include rotating, with the one or more physical computer processors, the labeled target content based on an angle of the dynamically captured content to generate rotated target content. Modifying the labeled target content may include cropping, with the one or more physical computer processors, the rotated target content such that the one or more eye features are in cropped target content. Modifying the labeled target content may include warping, with the one or more physical computer processors, the cropped target content to match the dynamically captured content corresponding to the one or more eye features.

In embodiments, generating the one or more facial features may include estimating, with the one or more physical computer processors, one or more bounding boxes around parts of the face in the dynamically captured content using one or more facial parameters in the dynamically captured content to identify the one or more facial features from the dynamically captured content. The one or more facial parameters may include one or more of a color, a curve, and a reflected light intensity. Generating the one or more facial features may include generating, with the one or more physical computer processors, the one or more facial features in the dynamically captured content using the one or more bounding boxes. Generating the one or more facial features may include identifying, with the one or more physical computer processors, a closed eye when a given image in converted captured content is within a first threshold range. The converted captured content may be derived from the dynamically captured content. Generating the one or more facial features may include generating, with the one or more physical computer processors, a position of a pupil when a portion of the given image in the converted captured content is within a second threshold range.

In embodiments, the capture device may include a head mounted display. The head mounted display may include a red-green-blue camera capturing one or more mouth features. The head mounted display may include an infrared camera capturing one or more eye features. The head mounted display may include an infrared illuminator illuminating the one or more eye features.

In accordance with additional aspects of the present disclosure, a system to modify labeled target content is provided. The system may include non-transient electronic storage, a graphical user interface, and one or more physical computer processors. The one or more physical computer processors may be configured by machine-readable instructions to perform a number of operations. One such operation is to obtain, from the non-transient electronic storage, labeled target content that may include one or more facial features that have been labeled. Another such operation is to modify, with the one or more physical computer processors, the labeled target content to match dynamically captured content from a first capture device to generate modified target content. The dynamically captured content may include the one or more facial features. Another such operation is to store, in the non-transient electronic storage, the modified target content.

In embodiments, modifying the labeled target content may include cropping, with the one or more physical computer processors, the labeled target content such that the one or more mouth features are in cropped target content. Modifying the labeled target content may include warping, with the one or more physical computer processors, the cropped target content to match the dynamically captured content corresponding to the one or more mouth features.

In embodiments, modifying the labeled target content may include rotating, with the one or more physical computer processors, the labeled target content based on an angle of the dynamically captured content to generate rotated target content. Modifying the labeled target content may include cropping, with the one or more physical computer processors, the rotated target content such that one or more eye features are in cropped target content. Modifying the labeled target content may include warping, with the one or more physical computer processors, the cropped target content to match the dynamically captured content corresponding to the one or more eye features.

In embodiments, the one or more physical computer processors are further configured by the machine-readable instructions to perform a number of operations. One such operation is to estimate, with the one or more physical computer processors, one or more bounding boxes around parts of the face in the dynamically captured content using one or more facial parameters in the dynamically captured content to identify the one or more facial features from the dynamically captured content. The one or more facial parameters may include one or more of a color, a curve, and a reflected light intensity. Another such operation is to generate, with the one or more physical computer processors, the one or more facial features in the dynamically captured content using the one or more bounding boxes. Another such operation is to identify, with the one or more physical computer processors, a closed eye when a given image in converted captured content is within a first threshold range. The converted captured content may be derived from the dynamically captured content. Another such operation is to generate, with the one or more physical computer processors, a position of a pupil when a portion of the given image in the converted captured content is within a second threshold range.

Other features and aspects of the disclosed technology will become apparent from the following detailed description, taken in conjunction with the accompanying drawings, which illustrate, by way of example, the features in accordance with embodiments of the disclosure. The summary is not intended to limit the scope of the claimed disclosure, which is defined solely by the claims attached hereto.

BRIEF DESCRIPTION OF THE DRAWINGS

The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawing(s) will be provided by the Office upon request and payment of the necessary fee.

Further aspects of the present disclosure will be more readily appreciated upon review of the detailed description of the various disclosed embodiments, described below, when taken in conjunction with the accompanying figures.

FIG. 1 illustrates an example system for modifying labeled content, in accordance with various embodiments.

FIG. 2 illustrates an example capture device, in accordance with various embodiments of the present disclosure.

FIG. 3 illustrates seven eye features for dynamically captured eye content, in accordance with various embodiments of the present disclosure.

FIG. 4 illustrates an example modification workflow, in accordance with various embodiments of the present disclosure.

FIG. 5 illustrates example modified target eye content, in accordance with various embodiments of the present disclosure.

FIG. 6 illustrates an example converted captured content, in accordance with various embodiments of the present disclosure.

FIG. 7 illustrates example representations, in accordance with various embodiments of the present disclosure.

FIG. 8 illustrates one or more example mouth features and cheek features, in accordance with various embodiments of the present disclosure.

FIG. 9 illustrates example modified target mouth content, in accordance with various embodiments of the present disclosure.

FIG. 10 illustrates an example bounding box estimation workflow, in accordance with various embodiments of the present disclosure.

FIG. 11 is an operational flow diagram illustrating an example process for modifying labeled content for a capture device, in accordance with one embodiment.

FIG. 12 illustrates a comparison between different eye feature predictors.

FIG. 13 illustrates a comparison between different eye feature predictors.

FIG. 14 illustrates a comparison between different mouth feature predictors.

FIG. 15 illustrates an example representation with dynamically captured content, in accordance with various embodiments of the present disclosure.

FIG. 16 illustrates an example representation with dynamically captured content, in accordance with various embodiments of the present disclosure.

FIG. 17 illustrates an example representation with dynamically captured content, in accordance with various embodiments of the present disclosure.

FIG. 18 illustrates an example representation with dynamically captured content, in accordance with various embodiments of the present disclosure.

FIG. 19 illustrates an example representation with dynamically captured content, in accordance with various embodiments of the present disclosure.

FIG. 20 illustrates an example representation with dynamically captured content, in accordance with various embodiments of the present disclosure.

FIG. 21 illustrates an example computing component that may be used to implement features of various embodiments of the disclosure.

The figures are described in greater detail in the description and examples below, are provided for purposes of illustration only, and merely depict typical or example embodiments of the disclosure. The figures are not intended to be exhaustive or to limit the disclosure to the precise form disclosed. It should also be understood that the disclosure may be practiced with modification or alteration, and that the disclosure may be limited only by the claims and the equivalents thereof.

DETAILED DESCRIPTION

Acquiring manually labeled training content for a specific application can be very expensive, and while such content may be available for casual camera imagery, this content may not be a good fit for other capture devices. Various embodiments of the present disclosure are directed to a method or system for modifying labeled content. Content, which may be referred to herein as target content, may include different resolution images (e.g., standard, high definition (HD), ultra HD (UHD), 4K UHD, 8K UHD, and/or other resolutions). The target content may include one or more images of one or more features that have been labeled. For example, the features may be one or more facial features of one or more people. The target content may be labeled such that one or more facial features are pre-identified in the target content. In embodiments, the labeled target content may include a set of landmark-labeled, or facial-feature-labeled, casual photography of one or more people's faces with different, arbitrary poses from one or more regular camera lenses with significantly different intrinsics (e.g., lens types, f-stop values, ISO values, etc.). For example, a labeled target image may include pre-labeled facial features identifying different parts of the face, as will be described in greater detail herein. It should be appreciated that other labeled features may be used in modifying labeled target content (e.g., skeletal system, arms, hands, fingers, legs, feet, toes, backs, fronts, etc.).

Modifying the labeled target content may include breaking up different facial features (e.g., eye features, mouth features, cheek features, etc.). Labeled target content may have pre-identified facial features. The one or more facial features may be rotated, cropped, and/or warped such that the modified target content matches the distortion, size, reflectivity, etc. of the dynamically captured content. In embodiments, the modified target content may be used as training content to generate the one or more facial features in dynamically captured content that has not been pre-labeled with the one or more facial features. In some embodiments, the one or more generated facial features may be recorded and stored to generate new sets of labeled target content. The one or more generated facial features may be used in generating a representation of a face, such as, for example, an animated avatar. The representation may be displayed on a graphical user interface.

Before describing the technology in detail, it may be useful to describe an example environment in which the presently disclosed technology can be implemented. FIG. 1 illustrates one such example environment 100.

Environment 100 may be used in connection with implementing embodiments of the disclosed systems, methods, and devices. By way of example, the various below-described components of FIG. 1 may be used to modify labeled target content for a capture device, generate one or more facial features from dynamically captured content using modified target content, and use the one or more facial features to generate a representation. Target content and dynamically captured content may include images, video, and/or other content. Dynamically captured content may have been captured using a first capture device. Target content may have been captured using one or more second capture devices.

As illustrated, the first capture device may correspond to capture device 103. FIG. 2 illustrates an example capture device, in accordance with various embodiments of the present disclosure. Capture device 103 may include one or more of Red-Green-Blue (RGB) camera 202, infrared (IR) camera 204, and IR illuminator 206. In embodiments, RGB camera 202 may include a wide-angle lens. In some embodiments, capture device 103 may include additional capture devices. In embodiments, capture device 103 may include a head-mounted display for use in AR/VR systems. The first capture device may be different from the second capture device. For example, a distortion created by a lens on the first capture device may be different than a distortion created by a lens on the second capture device. It should be appreciated that other differences may exist between the first capture device and the second capture device (e.g., lens type, lens angle, chromatic aberration, f-stop, ISO, sensor, different color formats, etc.).

Target content and dynamically captured content may include one or more facial features and/or other features. The one or more facial features may include one or more eye features, one or more mouth features, one or more cheek features, and/or other facial features. The dynamically captured content may include a video stream and/or an image of individual eyes, a mouth, cheeks, and/or other parts of the face. In embodiments, the dynamically captured content may include the whole face in a single image and/or video. Referring back to FIG. 1, server system 106 may include facial feature component 114, eye feature component 116, and mouth feature component 118, as will be described herein. Facial feature component 114 may be used with respect to the one or more facial features in the target content and the dynamically captured content, eye feature component 116 may be used with respect to the one or more eye features in the target content and the dynamically captured content, and mouth feature component 118 may be used with respect to the one or more mouth features in the target content and the dynamically captured content.

Facial feature component 114 may be used with respect to the one or more facial features. Facial feature component 114 may include eye feature component 116 and mouth feature component 118. Facial feature component 114 may modify labeled target content to match the dynamically captured content from the first capture device.

In embodiments, facial feature component 114 may generate the one or more facial features for the dynamically captured content. Using bounding boxes for individual ones of the eyes, mouth, cheeks, and/or other parts of the face and/or the modified training content as input, a facial feature prediction algorithm may be used to generate the one or more facial features for the dynamically captured content. The facial feature prediction algorithm may include an eye feature prediction algorithm, a mouth feature prediction algorithm, a cheek feature prediction algorithm, and/or other facial feature prediction algorithms. For example, a shape predictor may use a learning rate of about 0.1, a tree depth of about 4, a cascade depth of about 10, and a number of trees per cascade level of about 500.

In one example, the dynamically captured content may include multiple frames. An eye may be open in a first frame and closed in a second frame. Facial feature component 114 may identify an open eye in the first frame and a closed eye in the second frame. In some embodiments, the identified dynamically captured content may be stored to be used as training content.

In embodiments, facial feature component 114 may be used to dynamically generate a representation of a face based on dynamically captured content. For example, the representation may open its mouth or blink its eyes corresponding to the person in the dynamically captured content opening her mouth or blinking her eyes, respectively. In embodiments, the representation may be calibrated by measuring distances between individual ones of the one or more facial features in a facial position.

The representation may use visual effects to depict the one or more facial features. In embodiments, a visual effect may include one or more visual transformations of the representation and/or graphics. A visual transformation may include one or more visual changes in how the representation is presented or displayed. In some embodiments, a visual transformation may include one or more of a color gradient, a visual zoom, a visual filter, a visual rotation, and/or a visual overlay (e.g., text and/or graphics overlay). The visual effects may illustrate an avatar that simulates the facial movement of a person that is being captured by capture device 103.

Eye feature component 116 may be used with respect to the one or more eye features. The one or more eye features may include one or more landmarks identifying a part of an eye. FIG. 3 illustrates eye features and pupil estimation 302, or landmarks, for dynamically captured eye content 300, in accordance with various embodiments of the present disclosure. It should be appreciated that more or fewer eye features may be used. As illustrated, the eye features may identify the ends of the eye, the middle of the eye, and other parts of the eye. The labeled target content may include one or more labels for the one or more eye features.

Referring back to FIG. 1, eye feature component 116 may modify the labeled target content to match the dynamically captured content from the first capture device. Modifying the labeled target content for the one or more eye features may include rotating the labeled target content based on an angle of the dynamically captured content to generate rotated target content. For example, FIG. 4 illustrates example modification workflow 400, in accordance with various embodiments of the present disclosure. 402 illustrates an example of labeled target content. 404 illustrates rotating the labeled target content about 0.8 degrees from the x-axis with a Euler angle rotation. The about 0.8 degree rotation may be used because reference image 412 rotates the eye about 0.8 degrees. For different capture devices, the amount of rotation may be different.
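
By way of illustration, the rotation step may be sketched as follows, assuming OpenCV and NumPy (the disclosure does not name a library); the same affine matrix is applied to the landmark labels so they stay aligned with the rotated image:

```python
# Minimal sketch of the rotation step, assuming OpenCV/NumPy; the
# ~0.8 degree angle is the example value from the text.
import cv2
import numpy as np

def rotate_labeled_content(image, landmarks, angle_deg=0.8):
    """Rotate an image about its center and apply the same in-plane
    (Euler z-axis) rotation to its labeled landmark coordinates."""
    h, w = image.shape[:2]
    matrix = cv2.getRotationMatrix2D((w / 2.0, h / 2.0), angle_deg, 1.0)
    rotated = cv2.warpAffine(image, matrix, (w, h))
    # Apply the same 2x3 affine matrix to the (N, 2) landmark array.
    ones = np.ones((len(landmarks), 1), dtype=np.float64)
    rotated_landmarks = np.hstack([landmarks, ones]) @ matrix.T
    return rotated, rotated_landmarks
```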

Modifying the labeled target content for the one or more eye features may include cropping the rotated target content such that the one or more eye features are in cropped target content. For example, referring back to FIG. 4, 406 illustrates an example of cropping rotated target content. The cropping may take the smallest rectangular part of the rotated target content that includes all of the one or more eye features in the rotated target content. In embodiments, the rectangular part that may be used to crop multiple rotated target content may use a single aspect ratio (e.g., all cropped images have a 3:2 aspect ratio). In some embodiments, different shapes may be used to crop the rotated target content (e.g., circular, oval, triangular, diamond, etc.). In embodiments, different shapes and/or aspect ratios may be used for individual ones of the rotated target content to match the dynamically captured content. In embodiments, cropping may include scaling the cropped image to match a size of the dynamically captured content.
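
The crop step may be sketched similarly: take the tight box around the labeled eye features, grow it to a single aspect ratio, and scale it to the size of the dynamically captured content. The 3:2 ratio and the output size below are example values, and bounds clamping is omitted for brevity:

```python
# Illustrative crop-and-scale step; `features` is assumed to be an
# (N, 2) NumPy array of labeled eye landmarks in pixel coordinates.
import cv2

def crop_to_features(image, features, aspect=1.5, out_size=(192, 128)):
    xs, ys = features[:, 0], features[:, 1]
    x0, x1 = float(xs.min()), float(xs.max())
    y0, y1 = float(ys.min()), float(ys.max())
    w, h = x1 - x0, y1 - y0
    # Grow the tight box about its center until it reaches the target
    # aspect ratio, so every crop shares a single aspect ratio.
    if w / h < aspect:
        pad = (aspect * h - w) / 2.0
        x0, x1 = x0 - pad, x1 + pad
    else:
        pad = (w / aspect - h) / 2.0
        y0, y1 = y0 - pad, y1 + pad
    crop = image[int(y0):int(y1), int(x0):int(x1)]
    # Scale the crop to match the size of the dynamically captured content.
    return cv2.resize(crop, out_size)
```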

Modifying the labeled target content for the one or more eye features may include warping the cropped target content to match the dynamically captured content corresponding to the one or more eye features. For example, referring back to FIG. 4, 408 illustrates an example of warping cropped target content. The warping may be, for example, spherical warping, such that the center of the cropped target content may be magnified more and the edges of the cropped target content may be magnified less. In this example, spherical warping may be used because it approximates an effect of the lens in the dynamically captured content better than other warping effects. A center of the warping may be applied to the center of the average position of the one or more eye features for an individual eye. It should be appreciated that for different capture devices, the warping may be different. In embodiments, warping may include inpainting any empty regions in the warped target content that may have been caused by the warping or otherwise, as illustrated in 410.
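
A hedged sketch of the spherical warp and inpainting, assuming OpenCV: the radial remap model below is an illustrative approximation of center-weighted magnification, not a formula from the disclosure, and the empty-region mask is a crude zero-pixel proxy:

```python
# Illustrative spherical warp centered on the average eye-feature
# position, followed by inpainting; assumes a 3-channel uint8 image.
import cv2
import numpy as np

def spherical_warp(image, center, strength=0.35):
    h, w = image.shape[:2]
    ys, xs = np.indices((h, w), dtype=np.float32)
    dx, dy = xs - center[0], ys - center[1]
    r = np.sqrt(dx * dx + dy * dy)
    # Sample further from the center as r grows, so the middle of the
    # crop is magnified more than the edges.
    scale = 1.0 + strength * (r / max(r.max(), 1.0)) ** 2
    map_x = center[0] + dx * scale
    map_y = center[1] + dy * scale
    warped = cv2.remap(image, map_x, map_y, cv2.INTER_LINEAR,
                       borderMode=cv2.BORDER_CONSTANT, borderValue=0)
    # Inpaint empty regions introduced by the warp (zero pixels here
    # stand in for "empty"; real content may need an explicit mask).
    gray = cv2.cvtColor(warped, cv2.COLOR_BGR2GRAY)
    mask = (gray == 0).astype(np.uint8)
    return cv2.inpaint(warped, mask, 3, cv2.INPAINT_TELEA)
```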

The modification to the labeled target content may be applied to all of the labeled target content to generate modified target content. Referring back to FIG. 1, eye feature component 116 may store modified target content, which can be used as training content for dynamically captured content from the first capture device. For example, FIG. 5 illustrates example modified target eye content, in accordance with various embodiments of the present disclosure.

Referring back to FIG. 1, eye feature component 116 may generate the one or more eye features in the dynamically captured content using the modified target content. Generating the one or more eye features in the dynamically captured content using the modified target content may include using the dynamically captured content corresponding to the eye (e.g., the capture device that captures one or more of the eyes). Using the modified target content corresponding to the eye as training content, an estimate of a position for the one or more eye features can be generated. Using the entire dynamically captured content corresponding to the eye as a bounding box for the eye as input, the eye feature prediction algorithm may be used to generate the one or more eye features for the dynamically captured content. The eye feature prediction algorithm may estimate the one or more eye features based on an ensemble of regression trees. The ensemble may have a learning rate of about 0.1, a tree depth of about 4, a cascade depth of about 10, and a number of trees per cascade level of about 500. The eye feature prediction algorithm may be used per eye.
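
These ensemble parameters correspond one-to-one with the training options of dlib's ensemble-of-regression-trees shape predictor. The disclosure does not name dlib, so the sketch below is an assumption, and the file names are placeholders:

```python
# Hypothetical training sketch using dlib's shape predictor; the
# annotation file would be built from the modified target content.
import dlib

options = dlib.shape_predictor_training_options()
options.nu = 0.1                           # learning rate of about 0.1
options.tree_depth = 4                     # tree depth of about 4
options.cascade_depth = 10                 # cascade depth of about 10
options.num_trees_per_cascade_level = 500  # about 500 trees per level

dlib.train_shape_predictor("eye_landmarks.xml", "eye_predictor.dat",
                           options)

# At run time, the entire eye crop serves as the bounding box, and a
# separate predictor is used per eye:
predictor = dlib.shape_predictor("eye_predictor.dat")
# rect = dlib.rectangle(0, 0, eye_img.shape[1], eye_img.shape[0])
# landmarks = predictor(eye_img, rect)
```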

In embodiments, using the modified target content, an eye template comprising an average range of the one or more eye features in the modified training content may be generated. Applying the eye template to the dynamically captured content corresponding to the eye may be used to generate the one or more eye features for the dynamically captured content. In embodiments, parameters of the eye template may be narrowed (i.e., narrowing the range of the one or more eye features) to improve the position of the one or more eye features on the dynamically captured content. This calibration, which may include generating the one or more eye features and/or improving the position of the one or more eye features, may occur every frame of a video, every 30 seconds, every minute, once per session of the capture device, and/or at other periods of time.

In embodiments, generating the one or more eye features in the dynamically captured content using the modified target content may include estimating a bounding box around the eye in the dynamically captured content using one or more eye parameters to generate the one or more eye features. In some embodiments, the one or more eye parameters may include a position on the face (e.g., an upper half of a face), a shape of an eye (e.g., two curves meeting at the end, diamond-like, circular, oval, etc.), a color change (e.g., skin tone to whites of the eyes), a texture change (e.g., reflectivity, roughness, smoothness, etc.), and/or other parameters. In embodiments, the one or more eye parameters may be used to estimate a bounding box. In some embodiments, the eye template may be used as a filter, by scanning the eye template across multiple positions of the dynamically captured content to identify shapes that are within a threshold value of the eye template shape. Using a center of identified eye shapes, a bounding box may be generated and placed around the center of the identified eye shapes to limit the application of the eye template to the bounding box. The eye template may be used on identified eye shapes in the dynamically captured content to generate the one or more eye features for the dynamically captured content, as described above.
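
The template-as-filter scan may be sketched with normalized cross-correlation, assuming OpenCV; the disclosure does not specify the similarity measure or the threshold, so both are illustrative:

```python
# Illustrative template scan: slide the eye template across the frame
# and place a bounding box around the best-scoring location.
import cv2

def locate_eye(frame_gray, template_gray, threshold=0.7):
    scores = cv2.matchTemplate(frame_gray, template_gray,
                               cv2.TM_CCOEFF_NORMED)
    _, max_val, _, max_loc = cv2.minMaxLoc(scores)
    if max_val < threshold:
        return None  # no shape within the threshold of the template
    th, tw = template_gray.shape[:2]
    # Bounding box (x, y, w, h) around the identified eye shape.
    return max_loc[0], max_loc[1], tw, th
```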

Eye feature component 116 may generate a position for the one or more eye features in the dynamically captured content. For example, a first eye feature may be in a first position in a first frame, in a second position in a second frame, in a third position in a third frame, and in a first position in a fourth frame. This may correspond to squinting, blinking, widening, and/or other movements of the eye.

In embodiments, eye feature component 116 may identify a closed eye using converted captured content. Converted captured content may include content that has been processed into a different color format, or which has been otherwise derived from the dynamically captured content. For example, the dynamically captured content may be captured using an RGB camera, an IR camera, and/or an IR illuminator. The RGB camera may capture RGB content. Converted captured content may be RGB content converted into Hue-Saturation-Value (HSV) content. The converted captured content (e.g., HSV content) may be compared to a threshold range for one or more of the values (e.g., H, S, and/or V). For example, an eye may be determined to be closed when the threshold range is between about 220 and about 225 for the V channel in HSV. In some embodiments, the IR illuminator and IR camera may be used to identify a number of reflection pixels from the eye. A reflection pixel may be when the IR camera receives an IR light ray reflected off the eye from the IR illuminator. For example, when the number of reflection pixels is less than 10, the eye may be determined to be closed. In embodiments, multiple thresholds (e.g., the eye color threshold range and/or the IR eye threshold) may be used to confirm and cross-check other signs for whether an eye is closed. It should be appreciated that the labeled target data may be converted into any other color format using different source color formats and/or target color formats.
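
A hedged sketch of the closed-eye cross-check, assuming OpenCV: the V-channel range (about 220-225) and the reflection-pixel cutoff (fewer than 10) come from the text, while the use of the mean V value and a bright-pixel IR detector (> 250) are illustrative interpretations:

```python
# Illustrative two-signal closed-eye test; eye_bgr is an RGB-camera
# eye crop (BGR order) and ir_eye_gray is the IR-camera eye crop.
import cv2
import numpy as np

def eye_is_closed(eye_bgr, ir_eye_gray, v_range=(220, 225),
                  min_reflections=10):
    hsv = cv2.cvtColor(eye_bgr, cv2.COLOR_BGR2HSV)
    mean_v = float(hsv[:, :, 2].mean())
    color_says_closed = v_range[0] <= mean_v <= v_range[1]
    # Reflection pixels: IR rays from the illuminator bounced off the
    # (open) eye into the IR camera appear as bright pixels.
    reflection_pixels = int(np.count_nonzero(ir_eye_gray > 250))
    ir_says_closed = reflection_pixels < min_reflections
    # Multiple thresholds confirm and cross-check one another.
    return color_says_closed and ir_says_closed
```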

FIG. 6 illustrates an example converted captured content, in accordance with various embodiments of the present disclosure. 602 illustrates an example of dynamically captured content. 604 illustrates a converted image of an eye using an IR camera to capture reflections from an IR illuminator. 606 illustrates a converted image of a pupil in the HSV channel after a threshold has been applied to the converted image.

In some embodiments, generating a position for the one or more eye features in the dynamically captured content may include generating a position for a pupil when a part of the given image in the converted captured content is within a second threshold range. For example, the converted captured content may be RGB content converted into HSV content. The converted captured content may be compared to a threshold range for one or more of the values (e.g., H, S, and/or V). For example, the pupil may be determined to be in a part of the dynamically captured content that has a value between about 0 and about 10 in the V channel. In some embodiments, a center of the pupil may be estimated based on an average center using the bounding box for the eye and/or the one or more eye features. The one or more generated eye features may be stored to be used as training content.
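
The pupil estimate can be sketched the same way, assuming OpenCV: threshold the V channel to the stated 0-10 range and take the centroid of the resulting mask. The centroid is an illustrative stand-in for the "average center" described above:

```python
# Illustrative pupil locator over the converted (HSV) eye crop.
import cv2

def pupil_position(eye_bgr, v_max=10):
    hsv = cv2.cvtColor(eye_bgr, cv2.COLOR_BGR2HSV)
    # Keep pixels whose V value falls in the 0-10 threshold range.
    mask = cv2.inRange(hsv, (0, 0, 0), (180, 255, v_max))
    m = cv2.moments(mask, binaryImage=True)
    if m["m00"] == 0:
        return None  # no dark pupil pixels found (e.g., a closed eye)
    # Centroid of the thresholded region approximates the pupil center.
    return m["m10"] / m["m00"], m["m01"] / m["m00"]
```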

In embodiments, referring back to FIG. 1, eye feature component 116 may dynamically generate a representation of a face based on dynamically captured content. For example, the representation may blink its eyes corresponding to the person in the dynamically captured content blinking her eye. Eye feature component 116 may use the one or more generated eye features to generate the representation. For example, since one or more eye features have been generated, an eye blink, squint, etc. and a movement of a pupil may be captured from the dynamically captured content. This may be processed in server system 106 to generate a representation of a face that simulates the one or more eye features captured. The representation may use visual effects to depict the one or more eye features, as described above.

The representation may be displayed on electronic device 102. FIG. 7 illustrates example representations, in accordance with various embodiments of the present disclosure. 702, 704, and 706 may illustrate a more realistic representation of a person, while 708, 710, and 712 may be more similar to a virtual avatar. 702 may illustrate a neutral face position. 704 may illustrate a smiling position. 706 may illustrate a closed eye position. 710 may illustrate the representation looking right. 712 may illustrate a pout shape. 714 may illustrate the representation with its mouth open.

Referring back to FIG. 1, mouth feature component 118 may be used with respect to the one or more mouth features. The one or more mouth features may include one or more landmarks identifying a part of a mouth. FIG. 8 illustrates one or more example mouth features and one or more cheek features, in accordance with various embodiments of the present disclosure. As illustrated, there may be twenty mouth features and seven cheek features. It should be appreciated that more or fewer mouth features and/or cheek features may be used. As illustrated, the mouth features may identify an edge of the mouth, middle upper parts of the mouth, and middle lower parts of the mouth. It should be appreciated that the one or more mouth features may identify other parts of the mouth. The labeled target content may include one or more labels for the one or more mouth features and the one or more cheek features.

Referring back to FIG. 1, mouth feature component 118 may modify the labeled target content to match the dynamically captured content from the first capture device. Modifying the labeled target content for the one or more mouth features may include cropping the labeled target content such that the one or more mouth features are in cropped target content. The cropping may be similar to the cropping described above. As described above, cropping may include scaling the cropped image to match the size of the dynamically captured content.

Modifying the labeled target content for the one or more mouth features may include warping the cropped target content to match the dynamically captured content corresponding to the one or more mouth features. The warping may be similar to the warping described above. As described above, warping may include inpainting any empty regions in the warped target content that may have been caused by the warping or otherwise.

The modification to the labeled target content may be applied to all of the labeled target content to generate modified target content. Mouth feature component 118 may store modified target content, which can be used as training content for dynamically captured content from the first capture device. For example, FIG. 9 illustrates example modified target mouth content, in accordance with various embodiments of the present disclosure.

In embodiments, referring back to FIG. 1, mouth feature component 118 may generate the one or more mouth features in the dynamically captured content using the modified target content. Generating the one or more mouth features in the dynamically captured content using the modified target content may include using the dynamically captured content corresponding to the mouth (e.g., the capture device that captures the mouth). Using the modified target content corresponding to the mouth as training content, an estimate of a position for the one or more mouth features can be generated. Using a bounding box for the mouth as input, the mouth feature prediction algorithm may be used to generate the one or more mouth features for the dynamically captured content. The mouth feature prediction algorithm may estimate the one or more mouth features based on an ensemble of regression trees. The ensemble may have a learning rate of about 0.1, a tree depth of about 4, a cascade depth of about 10, and a number of trees per cascade level of about 500. The mouth feature prediction algorithm may be separate from the eye feature prediction algorithm.

In embodiments, generating the one or more mouth features in the dynamically captured content using the modified target content may include estimating a bounding box around the mouth in the dynamically captured content using one or more mouth parameters to generate the one or more mouth features. In some embodiments, the one or more mouth parameters may include a position on the face (e.g., a lower half of a face), a shape of a mouth (e.g., two curves meeting at the end, circular, oval, etc.), a color change (e.g., skin tone to lips), a texture change (e.g., reflectivity, roughness, smoothness, etc.), segmentation (e.g., color space segmentation, histogram segmentation, chromatism segmentation, etc.), and/or other parameters. In embodiments, the parameters may be used to estimate a bounding box.

FIG. 10 illustrates example bounding box estimation workflow 1000, in accordance with various embodiments of the present disclosure. 1002 illustrates an example input image, or dynamically captured content corresponding to the mouth. 1004 illustrates normalizing the dynamically captured content (e.g., RGB normalization). 1006 may illustrate histogram segmentation. Histogram segmentation may include generating a histogram of the converted captured content and/or comparing the histogram against an adaptive threshold value to select skin pixels. For example, for converted captured content that may include segmented captured content with pixels p and a histogram with bin value h(i), a given pixel may be classified as a skin pixel should one or more of the following conditions be met:

$h(x) > \frac{h_{\max}}{4}$ and $x > \frac{h_{high} + h_{low}}{2}$, where $h(x)$ may represent the bin value for a pixel value $x$, $h_{\max}$ may represent the maximum bin value, $h_{high}$ may represent the last histogram bin with more than 1 pixel, and $h_{low}$ may represent the first histogram bin with more than 1 pixel.
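
A sketch of this rule, assuming NumPy and reading $h(x)$ as the bin count for a pixel's value $x$ on a single-channel uint8 rendering of the converted content (that reading is an interpretation of the notation above):

```python
# Illustrative skin-pixel classifier implementing the histogram rule.
import numpy as np

def skin_mask(gray):
    hist = np.bincount(gray.ravel(), minlength=256)
    populated = np.nonzero(hist > 1)[0]   # bins with more than 1 pixel
    h_low, h_high = int(populated[0]), int(populated[-1])
    h_max = int(hist.max())
    # h(x) > h_max / 4 and x > (h_high + h_low) / 2; the surrounding
    # text also permits "one or more" of the conditions (an OR variant).
    return (hist[gray] > h_max / 4) & (gray > (h_high + h_low) / 2)
```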

1008 may illustrate color space segmentation. Color space segmentation may include converting the normalized captured content into another color space (e.g., HSV color space). The converted captured content may be filtered by the H channel between about 0 and about 120. The filtered captured content may be subtracted from the histogram-segmented content described above. The contours of the subtracted content may be extracted to identify the mouth. For example, the contours may be extracted by using a convex hull approximation. It should be appreciated that other algorithms may be used to extract the contours.
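
A sketch of this color space segmentation step, assuming OpenCV (whose H channel spans 0-179, so the 0-120 filter applies directly); `skin_mask` is assumed to be the boolean output of the histogram rule above:

```python
# Illustrative color space segmentation and contour extraction.
import cv2
import numpy as np

def mouth_contour(normalized_bgr, skin_mask):
    hsv = cv2.cvtColor(normalized_bgr, cv2.COLOR_BGR2HSV)
    # Filter by the H channel between about 0 and about 120.
    h_mask = cv2.inRange(hsv[:, :, 0], 0, 120)
    # Subtract the filtered content from the histogram-segmented content.
    diff = cv2.subtract(skin_mask.astype(np.uint8) * 255, h_mask)
    contours, _ = cv2.findContours(diff, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    # Approximate the largest contour with a convex hull to identify
    # the mouth region.
    return cv2.convexHull(max(contours, key=cv2.contourArea))
```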

1010 may illustrate chromatism segmentation. Chromatism segmentation may include filtering based on a threshold value. For example, the chromatism value, s, may be determined based on:

$s = \frac{2\tan^{-1}\left( \frac{R - G}{R} \right)}{\pi}$, where $R$ may represent the red values of RGB, and $G$ may represent the green values of RGB. An example threshold value to identify the mouth may be above 0 for s, the chromatism value.
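
A sketch of the chromatism filter, assuming NumPy; arctan2 stands in for the raw division to sidestep division by zero where R = 0, which is an implementation choice rather than part of the formula:

```python
# Illustrative chromatism segmentation: s = 2*atan((R - G) / R) / pi,
# keeping pixels with s > 0 as candidate mouth pixels.
import numpy as np

def chromatism_mask(bgr):
    b, g, r = (c.astype(np.float32) for c in np.moveaxis(bgr, -1, 0))
    s = 2.0 * np.arctan2(r - g, r) / np.pi
    return s > 0   # example threshold from the text
```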

In embodiments, the one or more segmentations may be used together to refine an estimate of the bounding box for the mouth. In some embodiments, dilation and erosion may be performed to remove noise from the one or more segmentations. A contour center of the mouth may be used to identify the mouth. The bounding box for the mouth may be identified using a maximum and a minimum contour point.
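
A sketch of this final denoise-and-box step, assuming OpenCV; the 5x5 kernel is an assumption, and `mask` is the 8-bit combined segmentation mask:

```python
# Illustrative denoising (dilation followed by erosion) and bounding
# box extraction from the minimum and maximum contour points.
import cv2
import numpy as np

def mouth_bounding_box(mask):
    kernel = np.ones((5, 5), np.uint8)
    cleaned = cv2.erode(cv2.dilate(mask, kernel), kernel)
    contours, _ = cv2.findContours(cleaned, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    points = np.vstack([c.reshape(-1, 2) for c in contours])
    x0, y0 = points.min(axis=0)
    x1, y1 = points.max(axis=0)
    return int(x0), int(y0), int(x1), int(y1)
```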

In one example, a mouth may be neutral in a first frame, smiling in a second frame, frowning in a third frame, and neutral in a fourth frame. The position of the one or more mouth features may change in the dynamically captured content. These changes may be identified by mouth feature component 118. It should be appreciated that finer changes may be identified than the three examples listed above, based on the subtleties of moving different parts of the mouth.

In embodiments, using the modified target content, a mouth template comprising an average range of the one or more mouth features in the modified training content may be generated. Applying the mouth template to the dynamically captured content corresponding to the mouth may be used to generate the one or more mouth features for the dynamically captured content. The mouth template may be similar to the eye template described above, which allows for finer calibration (e.g., a more accurate position of the one or more mouth features and a higher rate of re-calibration of the mouth template). In some embodiments, the mouth template may be used as a filter, by scanning the mouth template across multiple positions of the dynamically captured content to identify shapes that are within a threshold value of the mouth template shape. Using a center of the identified mouth shapes, a bounding box may be generated and placed around the center of the identified mouth shapes to limit the application of the mouth template to the bounding box. The mouth template may be used on identified mouth shapes in the dynamically captured content to generate the one or more mouth features for the dynamically captured content, as described above.

In embodiments, mouth feature component 118 may dynamically generate a representation of a face based on the dynamically captured content. In some embodiments, mouth feature component 118 may display the representation. This may be similar to the representation described in greater detail above.

Electronic device 102 may include a variety of electronic computing devices, such as, for example, a smartphone, tablet, laptop, computer, wearable device, television, virtual reality device, augmented reality device, displays, connected home device, Internet of Things (IoT) device, smart speaker, and/or other devices. Electronic device 102 may present content and/or representations to a user and/or receive requests to send content and/or representations to another user. In some embodiments, electronic device 102 may apply facial feature component 114, eye feature component 116, and/or mouth feature component 118 to labeled target content, dynamically captured content, and/or representations. In embodiments, electronic device 102 may store content, representations, models, algorithms, and related information of facial feature component 114, eye feature component 116, and/or mouth feature component 118, as well as facial feature component 114, eye feature component 116, and/or mouth feature component 118 themselves.

Capture device 103 may include one or more capture devices, such as, for example, an RGB camera, an IR camera, an IR illuminator, a video camera, a phone camera, a digital single-lens reflex camera, a film camera, a head mounted camera, a 360 degree camera, and/or other cameras. The cameras may capture content at various resolutions, such as those described above. The cameras may use one or more lenses, such as, for example, a fisheye lens, a wide-angle lens, a prime lens, a zoom lens, a telephoto lens, and/or other lenses that cause different types of distortion to the dynamically captured content. For example, a fisheye lens may cause more spherical distortion to the dynamically captured content than a prime lens. The lenses may capture different degrees of an area, which can also affect distortion of the dynamically captured content. For example, a 360 degree camera may create lens distortions. In embodiments, the cameras may be mounted, or otherwise coupled, to robots and/or autonomous vehicles.

As shown in FIG. 1, environment 100 may include one or more of electronic device 102, capture device 103, and server system 106. Electronic device 102 and/or capture device 103 can be coupled to server system 106 via communication media 104. As will be described in detail herein, electronic device 102, capture device 103, and/or server system 106 may exchange communication signals, including content (e.g., labeled target content, dynamically captured content, converted content, normalized content, segmented content, etc.), representations, metadata, algorithms, modifications (e.g., rotating, cropping, warping, etc.), user input, bounding boxes, and/or other information via communication media 104.

In various embodiments, communication media 104 may be based on one or more wireless communication protocols, such as Wi-Fi, Bluetooth®, ZigBee, 802.11 protocols, IR, Radio Frequency (RF), 2G, 3G, 4G, 5G, etc., and/or wired protocols and media. Communication media 104 may be implemented as a single medium in some cases.

As mentioned above, communication media 104 may be used to connect or communicatively couple electronic device 102, capture device 103, and/or server system 106 to one another or to a network, and communication media 104 may be implemented in a variety of forms. For example, communication media 104 may include an Internet connection, such as a local area network (LAN), a wide area network (WAN), a fiber optic network, internet over power lines, a hard-wired connection (e.g., a bus), and the like, or any other kind of network connection. Communication media 104 may be implemented using any combination of routers, cables, modems, switches, fiber optics, wires, radio (e.g., microwave/RF links), and the like. Upon reading the present disclosure, it should be appreciated that other ways may be used to implement communication media 104 for communications purposes.

Likewise, it will be appreciated that a similar communication medium may be used to connect or communicatively couple server 108, storage 110, processor 112, facial feature component 114, eye feature component 116, and/or mouth feature component 118 to one another in addition to other elements of environment 100. In example implementations, communication media 104 may be or include a wired or wireless wide area network (e.g., cellular, fiber, and/or circuit-switched connection, etc.) for electronic device 102, capture device 103, and/or server system 106, which may be relatively geographically disparate; and in some cases, aspects of communication media 104 may involve a wired or wireless local area network (e.g., Wi-Fi, Bluetooth, unlicensed wireless connection, USB, HDMI, standard AV, etc.), which may be used to communicatively couple aspects of environment 100 that may be relatively close geographically.

Server system 106 may provide, receive, collect, or monitor information to/from electronic device 102 and/or capture device 103, such as, for example, content (e.g., labeled target content, dynamically captured content, converted content, normalized content, segmented content, etc.), representations, metadata, algorithms, modifications (e.g., rotating, cropping, warping, etc.), user input, bounding boxes, and the like. Server system 106 may be configured to receive or send such information via communication media 104. This information may be stored in storage 110 and may be processed using processor 112. For example, processor 112 may include an analytics engine capable of performing analytics on information that server system 106 has collected, received, etc. from electronic device 102 and/or capture device 103. Processor 112 may include facial feature component 114, eye feature component 116, and/or mouth feature component 118 capable of receiving labeled target content, dynamically captured content, and/or representations, analyzing labeled target content, dynamically captured content, and/or representations, and otherwise processing content, dynamically captured content, and/or representations and generating information, content, and/or representations that server system 106 has collected, received, etc. based on requests from, or coming from, electronic device 102 and/or capture device 103. In embodiments, server 108, storage 110, and processor 112 may be implemented as a distributed computing network, a relational database, or the like.

Server 108 may include, for example, an Internet server, a router, a desktop or laptop computer, a smartphone, a tablet, a processor, a component, or the like, and may be implemented in various forms, including, for example, in an integrated circuit or collection thereof, in a printed circuit board or collection thereof, or in a discrete housing/package/rack or multiple of the same. Server 108 may update information stored on electronic device 102 and/or capture device 103. Server 108 may send/receive information to/from electronic device 102 and/or capture device 103 in real-time or sporadically. Further, server 108 may implement cloud computing capabilities for electronic device 102 and/or capture device 103. Upon studying the present disclosure, one of skill in the art will appreciate that environment 100 may include multiple electronic devices 102, capture devices 103, communication media 104, server systems 106, servers 108, storage 110, processors 112, facial feature components 114, eye feature components 116, and/or mouth feature components 118.

FIG. 11 is an operational flow diagram illustrating an example process for modifying labeled content for a capture device, in accordance with one embodiment. The operations of the various methods described herein are not necessarily limited to the order described or shown in the figures, and it should be appreciated, upon studying the present disclosure, that variations of the order of the operations described herein are within the spirit and scope of the disclosure.

The operations and sub-operations of the flow diagram may be carried out, in some cases, by one or more of the components, elements, devices, and circuitry of system 100. This may include one or more of: server system 106; server 108; processor 112; storage 110; and/or computing component 2100, described herein and referenced with respect to at least FIGS. 1 and 21, as well as subcomponents, elements, devices, and circuitry depicted therein and/or described with respect thereto. In such instances, the description of the flow diagram may refer to a corresponding component, element, etc., but regardless of whether an explicit reference is made, it will be appreciated, upon studying the present disclosure, when the corresponding component, element, etc. may be used. Further, it will be appreciated that such references do not necessarily limit the described methods to the particular component, element, etc. referred to. Thus, it will be appreciated that aspects and features described above in connection with (sub-)components, elements, devices, circuitry, etc., including variations thereof, may be applied to the various operations described in connection with the flow diagram without departing from the scope of the present disclosure.

At 1102, labeled target content may be obtained. The labeled target content may include an image, a video, and/or other media content. The labeled target content may include features of one or more people, such as, for example, facial features. The labeled target content may include one or more pre-labeled facial features, eye features, mouth features, cheek features, etc.

At 1104, the labeled target content may be modified to generate modified target content. As described above, the labeled target content may be rotated, cropped, warped, etc. to match the corresponding parts of the face in the dynamically captured content.

At 1106, the modified target content may be stored.

At 1108, the one or more facial features in the dynamically captured content may be generated. The first capture device used to capture the dynamically captured content may be different (e.g., different distortions on the content) from the second capture device(s) used to capture the labeled target content. In embodiments, the capture device used to capture the dynamically captured content may include an RGB camera, an IR camera, and an IR illuminator. Using the modified target content as training content, the one or more facial features, etc. can be generated by estimating a bounding box for different parts of the face, converting the dynamically captured content, normalizing the dynamically captured content, segmenting the dynamically captured content, etc. The generation may occur in real-time. In some embodiments, the generation may occur at the beginning of a session to calibrate the first capture device. The one or more facial features may change by frame, which may include a change in position of the one or more facial features, whether an eye is open or closed, a position of a pupil, or other changes, as described above.

At 1110, a representation of a face may be dynamically generated. The representation may change how the eyes, mouth, and face move corresponding to the dynamically captured content and/or the one or more generated facial features, as described above.

At 1112, the representation may be displayed.

FIG. 12 illustrates a comparison between different eye feature predictors. 1202 illustrates the eye predictor predicting six eye features using the target content. 1204 illustrates the eye predictor predicting six eye features using the modified target content. 1204 more accurately identifies the eye features than 1202.

FIG. 13 illustrates a comparison between different eyes. 1302 illustrates a first eye with seven eye features. The eye features may include pupil feature 1303. 1304 illustrates a second eye with seven eye features. 1306 illustrates a third eye with seven eye features.

FIG. 14 illustrates a comparison between different mouth feature predictors. 1402 illustrates the mouth predictor predicting twenty mouth features using the target content. 1404 illustrates the mouth predictor predicting twenty mouth features using the modified target content. 1404 more accurately identifies the mouth features than 1402.

FIGS. 15-20 illustrate example representations with dynamically captured content, in accordance with various embodiments of the present disclosure. The representations displayed may simulate the person's actions captured in the dynamically captured content. The dynamically captured content for individual eyes and the mouth may be overlaid in the virtual environment of the representation. Six eye features per eye are used and twenty mouth features are used. The bounding box is illustrated for the mouth as well.

As used herein, the term component might describe a given unit of functionality that can be performed in accordance with one or more embodiments of the technology disclosed herein. As used herein, a component might be implemented utilizing any form of hardware, software, or a combination thereof. For example, one or more processors, controllers, ASICs, PLAs, PALs, CPLDs, FPGAs, logical components, software routines or other mechanisms might be implemented to make up a component. In implementation, the various components described herein might be implemented as discrete components or the functions and features described can be shared in part or in total among one or more components. In other words, as would be apparent to one of ordinary skill in the art after reading this description, the various features and functionality described herein may be implemented in any given application and can be implemented in one or more separate or shared components in various combinations and permutations. As used herein, the term engine may describe a collection of components configured to perform one or more specific tasks. Even though various features or elements of functionality may be individually described or claimed as separate components or engines, one of ordinary skill in the art will understand that these features and functionality can be shared among one or more common software and hardware elements, and such description shall not require or imply that separate hardware or software components are used to implement such features or functionality.

Where engines and/or components of the technology are implemented in whole or in part using software, in one embodiment, these software elements can be implemented to operate with a computing or processing component capable of carrying out the functionality described with respect thereto. One such example computing component is shown in FIG. 21. Various embodiments are described in terms of this example computing component 2100. After reading this description, it will become apparent to a person skilled in the relevant art how to implement the technology using other computing components or architectures.

Referring now to FIG. 21, computing component 2100 may represent, for example, computing or processing capabilities found within desktop, laptop, and notebook computers; hand-held computing devices (PDAs, smart phones, cell phones, palmtops, etc.); mainframes, supercomputers, workstations, or servers; or any other type of special-purpose or general-purpose computing devices as may be desirable or appropriate for a given application or environment. Computing component 2100 might also represent computing capabilities embedded within or otherwise available to a given device. For example, a computing component might be found in other electronic devices such as, for example, digital cameras, navigation systems, cellular telephones, portable computing devices, modems, routers, WAPs, terminals and other electronic devices that might include some form of processing capability.

Computing component 2100 might include, for example, one or more processors, controllers, control components, or other processing devices, such as a processor 2104. Processor 2104 might be implemented using a general-purpose or special-purpose processing engine such as, for example, a physical computer processor, microprocessor, controller, or other control logic. In the illustrated example, processor 2104 is connected to a bus 2102, although any communication medium can be used to facilitate interaction with other components of computing component 2100 or to communicate externally.

Computing component 2100 might also include one or more memory components, simply referred to herein as main memory 2108. For example, random access memory (RAM) or other dynamic memory might be used for storing information and instructions to be executed by processor 2104. Main memory 2108 might also be used for storing temporary variables or other intermediate information during execution of instructions to be executed by processor 2104. Computing component 2100 might likewise include a read-only memory (“ROM”) or other static storage device coupled to bus 2102 for storing static information and instructions for processor 2104.

The computing component 2100 might also include one or more various forms of information storage device 2110, which might include, for example, a media drive 2112 and a storage unit interface 2120. The media drive 2112 might include a drive or other mechanism to support fixed or removable storage media 2114. For example, a hard disk drive, a floppy disk drive, a magnetic tape drive, an optical disk drive, a CD or DVD drive (R or RW), or other removable or fixed media drive might be provided. Accordingly, storage media 2114 might include, for example, non-transient electronic storage, a hard disk, a floppy disk, magnetic tape, cartridge, optical disk, a CD or DVD, or other fixed or removable medium that is read by, written to, or accessed by media drive 2112. As these examples illustrate, the storage media 2114 can include a computer usable storage medium having stored therein computer software or data.

In alternative embodiments, information storage mechanism 2110 might include other similar instrumentalities for allowing computer programs or other instructions or data to be loaded into computing component 2100. Such instrumentalities might include, for example, a fixed or removable storage unit 2122 and an interface 2120. Examples of such storage units 2122 and interfaces 2120 can include a program cartridge and cartridge interface, a removable memory (for example, a flash memory or other removable memory component) and memory slot, a PCMCIA slot and card, and other fixed or removable storage units 2122 and interfaces 2120 that allow software and data to be transferred from the storage unit 2122 to computing component 2100.

Computing component 2100 might also include a communications interface 2124. Communications interface 2124 might be used to allow software and data to be transferred between computing component 2100 and external devices. Examples of communications interface 2124 might include a modem or softmodem, a network interface (such as an Ethernet, network interface card, WiMedia, IEEE 802.XX, or other interface), a communications port (such as for example, a USB port, IR port, RS232 port, Bluetooth® interface, or other port), or other communications interface. Software and data transferred via communications interface 2124 might typically be carried on signals, which can be electronic, electromagnetic (which includes optical), or other signals capable of being exchanged by a given communications interface 2124. These signals might be provided to communications interface 2124 via channel 2128. This channel 2128 might carry signals and might be implemented using a wired or wireless communication medium. Some examples of a channel might include a phone line, a cellular link, an RF link, an optical link, a network interface, a local or wide area network, and other wired or wireless communications channels.

In this document, the terms “computer program medium” and “computer usable medium” are used to generally refer to media such as, for example, memory 2108, storage unit 2122, media 2114, and channel 2128. These and other various forms of computer program media or computer usable media may be involved in carrying one or more sequences of one or more instructions to a processing device for execution. Such instructions embodied on the medium are generally referred to as “computer program code” or a “computer program product” (which may be grouped in the form of computer programs or other groupings). When executed, such instructions might enable the computing component 2100 to perform features or functions of the disclosed technology as discussed herein.

While various embodiments of the disclosed technology have been described above, it should be understood that they have been presented by way of example only, and not of limitation. Likewise, the various diagrams may depict an example architectural or other configuration for the disclosed technology, which is done to aid in understanding the features and functionality that can be included in the disclosed technology. The disclosed technology is not restricted to the illustrated example architectures or configurations, but the desired features can be implemented using a variety of alternative architectures and configurations. Indeed, it will be apparent to one of skill in the art how alternative functional, logical, or physical partitioning and configurations can be used to implement the desired features of the technology disclosed herein. Also, a multitude of different constituent component names other than those depicted herein can be applied to the various partitions. Additionally, with regard to flow diagrams, operational descriptions, and method claims, the order in which the steps are presented herein shall not mandate that various embodiments be implemented to perform the recited functionality in the same order unless the context dictates otherwise.

Although the disclosed technology is described above in terms of various exemplary embodiments and implementations, it should be understood that the various features, aspects, and functionality described in one or more of the individual embodiments are not limited in their applicability to the particular embodiment with which they are described, but instead can be applied, alone or in various combinations, to one or more of the other embodiments of the disclosed technology, whether or not such embodiments are described and whether or not such features are presented as being a part of a described embodiment. Thus, the breadth and scope of the technology disclosed herein should not be limited by any of the above-described exemplary embodiments.

Terms and phrases used in this document, and variations thereof, unless otherwise expressly stated, should be construed as open ended as opposed to limiting. As examples of the foregoing: the term “including” should be read as meaning “including, without limitation” or the like; the term “example” is used to provide exemplary instances of the item in discussion, not an exhaustive or limiting list thereof; the terms “a” or “an” should be read as meaning “at least one,” “one or more” or the like; and adjectives such as “conventional,” “traditional,” “normal,” “standard,” “known,” and terms of similar meaning should not be construed as limiting the item described to a given time period or to an item available as of a given time, but instead should be read to encompass conventional, traditional, normal, or standard technologies that may be available or known now or at any time in the future. Likewise, where this document refers to technologies that would be apparent or known to one of ordinary skill in the art, such technologies encompass those apparent or known to the skilled artisan now or at any time in the future.

The presence of broadening words and phrases such as “one or more,” “at least,” “but not limited to,” or other like phrases in some instances shall not be read to mean that the narrower case is intended or required in instances where such broadening phrases may be absent. The use of the term “component” does not imply that the components or functionality described or claimed as part of the component are all configured in a common package. Indeed, any or all of the various components of a component, whether control logic or other components, can be combined in a single package or separately maintained and can further be distributed in multiple groupings or packages or across multiple locations.

Additionally, the various embodiments set forth herein are described in terms of exemplary block diagrams, flow charts, and other illustrations. As will become apparent to one of ordinary skill in the art after reading this document, the illustrated embodiments and their various alternatives can be implemented without confinement to the illustrated examples. For example, block diagrams and their accompanying description should not be construed as mandating a particular architecture or configuration.

What is claimed is:
1. A computer-implemented method for modifying labeled target content, the method being implemented in a computer system that comprises a storage, one or more physical computer processors, and a graphical user interface, the method comprising: obtaining, from the storage, labeled target content from a first capture device, the labeled target content comprising one or more first facial features that have been labeled; modifying, with the one or more physical computer processors, the labeled target content to match dynamically captured content from a second capture device to generate modified target content, wherein the second capture device is different from the first capture device; estimating, with the one or more physical computer processors, one or more bounding boxes in the dynamically captured content based on the modified target content; and generating one or more second facial features for the dynamically captured content based on the one or more bounding boxes.
2. The computer-implemented method of claim 1, wherein the one or more first facial features comprise one or more mouth features.
3. The computer-implemented method of claim 2, wherein modifying the labeled target content comprises: cropping, with the one or more physical computer processors, the labeled target content such that the one or more mouth features are in cropped target content; and warping, with the one or more physical computer processors, the cropped target content to match one or more parts of the dynamically captured content that correspond to the one or more mouth features.
4. The computer-implemented method of claim 1, wherein the one or more first facial features comprise one or more eye features.
5. The computer-implemented method of claim 4, wherein modifying the labeled target content comprises: rotating, with the one or more physical computer processors, the labeled target content based on an angle of the dynamically captured content to generate rotated target content; cropping, with the one or more physical computer processors, the rotated target content such that the one or more eye features are in cropped target content; and warping, with the one or more physical computer processors, the cropped target content to match one or more parts of the dynamically captured content that correspond to the one or more eye features.
6. The computer-implemented method of claim 1, wherein the one or more first facial features comprise one or more eye features, and wherein generating the one or more second facial features comprises: estimating, with the one or more physical computer processors, the one or more bounding boxes around parts of a face in the dynamically captured content using one or more facial parameters in the dynamically captured content to generate the one or more second facial features from the dynamically captured content, wherein the one or more facial parameters comprise one or more of a color, a curve, or a reflected light intensity; generating, with the one or more physical computer processors, the one or more second facial features in the dynamically captured content using the one or more bounding boxes; identifying, with the one or more physical computer processors, a closed eye when a given image in converted captured content is within a first threshold range, wherein the converted captured content is derived from the dynamically captured content; and generating, with the one or more physical computer processors, a position of a pupil when a portion of the given image in the converted captured content is within a second threshold range.
7. The computer-implemented method of claim 1, further comprising: dynamically generating, with the one or more physical computer processors, a representation of a face using visual effects to depict the one or more second facial features; and displaying, via the graphical user interface, the representation.
8. The computer-implemented method of claim 1, wherein modifying the labeled target content comprises converting, with the one or more physical computer processors, the labeled target content into converted content, wherein the converted content uses a different color format than the labeled target content.
9. The computer-implemented method of claim 1, wherein the second capture device comprises a red-green-blue camera, an infrared camera, an infrared illuminator, or a combination thereof.
10. The computer-implemented method of claim 1, further comprising: dynamically generating, with the one or more physical computer processors, a representation of a face using visual effects to depict the one or more second facial features or changes to the one or more second facial features.
11. The computer-implemented method of claim 10, wherein the one or more first facial features comprise one or more mouth features, and wherein modifying the labeled target content comprises: cropping, with the one or more physical computer processors, the labeled target content such that the one or more mouth features are in cropped target content; and warping, with the one or more physical computer processors, the cropped target content to match one or more parts of the dynamically captured content that correspond to the one or more mouth features.
12. The computer-implemented method of claim 10, wherein the one or more first facial features comprise one or more eye features, and wherein modifying the labeled target content comprises: rotating, with the one or more physical computer processors, the labeled target content based on an angle of the dynamically captured content to generate rotated target content; cropping, with the one or more physical computer processors, the rotated target content such that the one or more eye features are in cropped target content; and warping, with the one or more physical computer processors, the cropped target content to match one or more parts of the dynamically captured content that correspond to the one or more eye features.
13. The computer-implemented method of claim 10, wherein the one or more first facial features comprise one or more eye features, and wherein generating the one or more second facial features comprises: estimating, with the one or more physical computer processors, the one or more bounding boxes around parts of a face in the dynamically captured content using one or more facial parameters in the dynamically captured content, wherein the one or more facial parameters comprise one or more of a color, a curve, or a reflected light intensity; generating, with the one or more physical computer processors, the one or more second facial features in the dynamically captured content using the one or more bounding boxes; identifying, with the one or more physical computer processors, a closed eye when a given image in converted captured content is within a first threshold range, wherein the converted captured content is derived from the dynamically captured content; and generating, with the one or more physical computer processors, a position of a pupil when a portion of the given image in the converted captured content is within a second threshold range.
14. The computer-implemented method of claim 10, wherein the second capture device comprises a head mounted display, and wherein the head mounted display comprises a red-green-blue camera, an infrared camera, an infrared illuminator, or a combination thereof.
15. A computer-implemented method for modifying labeled target content for a capture device, the method being implemented in a computer system that comprises a storage, one or more physical computer processors, and a graphical user interface, the method comprising: obtaining, from the storage, labeled target content, the labeled target content comprising one or more first facial features that have been labeled; modifying, with the one or more physical computer processors, the labeled target content to match dynamically captured content from a first capture device to generate modified target content; estimating, with the one or more physical computer processors, one or more bounding boxes in the dynamically captured content based on the modified target content; and generating one or more second facial features for the dynamically captured content based on the one or more bounding boxes, wherein the dynamically captured content includes a first distortion associated with the first capture device and the labeled target content includes a second distortion associated with a second capture device used to capture the labeled target content.
16. A system to modify labeled target content, the system comprising: a storage; a graphical user interface; and one or more physical computer processors configured by machine-readable instructions to: obtain, from the storage, labeled target content from a first capture device, the labeled target content comprising one or more first facial features that have been labeled; modify, with the one or more physical computer processors, the labeled target content to match dynamically captured content from a second capture device to generate modified target content, wherein the second capture device is different from the first capture device; estimate one or more bounding boxes in the dynamically captured content based on the modified target content; and generate one or more second facial features for the dynamically captured content based on the one or more bounding boxes.
17. The system of claim 16, wherein the one or more first facial features comprise one or more mouth features, and wherein modifying the labeled target content comprises: cropping, with the one or more physical computer processors, the labeled target content such that the one or more mouth features are in cropped target content; and warping, with the one or more physical computer processors, the cropped target content to match one or more parts of the dynamically captured content that correspond to the one or more mouth features.
18. The system of claim 16, wherein the one or more first facial features comprise one or more eye features, and wherein modifying the labeled target content comprises: rotating, with the one or more physical computer processors, the labeled target content based on an angle of the dynamically captured content to generate rotated target content; cropping, with the one or more physical computer processors, the rotated target content such that the one or more eye features are in cropped target content; and warping, with the one or more physical computer processors, the cropped target content to match one or more parts of the dynamically captured content that correspond to the one or more eye features.
19. The system of claim 16, wherein the one or more first facial features comprise one or more eye features, and wherein the one or more physical computer processors are further configured by the machine-readable instructions to: estimate the one or more bounding boxes around parts of a face in the dynamically captured content using one or more facial parameters in the dynamically captured content, wherein the one or more facial parameters comprise one or more of a color, a curve, or a reflected light intensity; identify a closed eye when a given image in converted captured content is within a first threshold range, wherein the converted captured content is derived from the dynamically captured content; and generate a position of a pupil when a portion of the given image in the converted captured content is within a second threshold range.